An Effective Combination of Different Order N-grams
Abstract
In this paper, an approach is proposed for combining n-grams of different orders based on a discriminative estimation criterion, under which the n-gram parameters can be optimized. To strengthen the modeling of linguistic information, we propose several schemes for combining conventional n-gram language models of different orders. We employ the Newton gradient method to estimate the assumption probabilities and then test the optimally selected language model. Experiments are conducted on the task of converting Chinese pinyin to Chinese characters. The experimental results show that the memory footprint of the language model can be reduced remarkably with little loss of accuracy.
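The abstract does not give the combination formula. As a rough illustration only, the sketch below assumes the simplest form: a linear interpolation of unigram, bigram and trigram estimates whose mixture weights stand in for the "assumption probabilities" (how far to trust each order's Markov assumption). It does not reproduce the paper's discriminative criterion or its Newton gradient estimation; it swaps in the standard EM re-estimation of interpolation weights on held-out text. The class `InterpolatedTrigram` and its methods are hypothetical names chosen for this example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class InterpolatedTrigram:
    """Hypothetical sketch: linear interpolation of 1-, 2- and 3-gram estimates.

    The mixture weights stand in for the "assumption probabilities".
    """

    def __init__(self, tokens: Sequence[str]) -> None:
        tokens = list(tokens)
        self.uni = Counter(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        self.tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
        # Context counts: how often a word / word pair is followed by another token.
        self.bi_ctx = Counter(tokens[:-1])
        self.tri_ctx = Counter(zip(tokens[:-2], tokens[1:-1]))
        self.total = len(tokens)
        self.vocab_size = len(self.uni)
        self.weights: List[float] = [1 / 3, 1 / 3, 1 / 3]

    def component_probs(self, u: str, v: str, w: str) -> Tuple[float, float, float]:
        # Add-one unigram (one extra slot for unknown words) keeps p1 > 0.
        p1 = (self.uni[w] + 1) / (self.total + self.vocab_size + 1)
        # Maximum-likelihood bigram and trigram estimates; 0 for unseen contexts.
        p2 = self.bi[(v, w)] / self.bi_ctx[v] if self.bi_ctx[v] else 0.0
        p3 = self.tri[(u, v, w)] / self.tri_ctx[(u, v)] if self.tri_ctx[(u, v)] else 0.0
        return p1, p2, p3

    def prob(self, u: str, v: str, w: str) -> float:
        # Weighted sum of the three component estimates.
        return sum(l * p for l, p in zip(self.weights, self.component_probs(u, v, w)))

    def log_likelihood(self, tokens: Sequence[str]) -> float:
        tokens = list(tokens)
        return sum(math.log(self.prob(u, v, w))
                   for u, v, w in zip(tokens, tokens[1:], tokens[2:]))

    def fit_weights(self, heldout: Sequence[str], iters: int = 20) -> None:
        """EM re-estimation of the mixture weights on held-out text."""
        heldout = list(heldout)
        trigrams = list(zip(heldout, heldout[1:], heldout[2:]))
        if not trigrams:
            return
        for _ in range(iters):
            expected = [0.0, 0.0, 0.0]
            for u, v, w in trigrams:
                ps = self.component_probs(u, v, w)
                mix = sum(l * p for l, p in zip(self.weights, ps))
                for k in range(3):
                    # Posterior responsibility of order k+1 for this token.
                    expected[k] += self.weights[k] * ps[k] / mix
            self.weights = [e / len(trigrams) for e in expected]


if __name__ == "__main__":
    train = "the cat sat on the mat the dog sat on the rug".split()
    model = InterpolatedTrigram(train)
    model.fit_weights("the cat sat on the rug".split())
    print(model.weights)
```

For fixed component models, each EM pass never decreases the held-out log-likelihood, which makes it a convenient baseline. A discriminative criterion like the one the paper describes would presumably tune the weights toward conversion accuracy rather than likelihood, and this sketch says nothing about the memory savings the abstract reports.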
Related papers
Comparison of Character n-grams and Lexical Features on Author, Gender, and Language Variety Identification on the Same Spanish News Corpus
We compare the performance of character n-gram features (n = 3–8) and lexical features (unigrams and bigrams of words), as well as their combinations, on the tasks of authorship attribution, author profiling, and discriminating between similar languages. We developed a single multi-labeled corpus for the three aforementioned tasks, composed of news articles in different varieties of Spanish. We...
MIRACLE's Hybrid Approach to Bilingual and Monolingual Information Retrieval
The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of appl...
Ensemble classifier for Twitter sentiment analysis
In this paper, we present a combination of different types of sentiment analysis approaches in order to improve on their individual performance. These consist of (I) ranking algorithms for scoring sentiment features such as bi-grams and skip-grams extracted from annotated corpora; (II) a polarity classifier based on a deep learning algorithm; and (III) a semi-supervised system founded on the...
From Characters to Words to in Between: Do We Capture Morphology?
Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we pre...
Combination of different n-grams based on their different assumptions
This paper addresses the negative impact that the assumptions artificially introduced by n-grams of different orders have on their performance in natural language processing. To raise the power of modeling language information, we propose several schemes to combine conventional n-gram language models of different orders by introducing probabilities of assumption. The assumption probabilities are estimated on the b...
Publication date: 2003